Independent automatic segmentation by self-learning categorial pronunciation rules

Author

  • Nicole Beringer
Abstract

This paper presents a new method for automatically generating pronunciation rules for the automatic segmentation of speech: the German MAUSER system. MAUSER is an algorithm that generates pronunciation rules independently of any domain-dependent training data, either by clustering and statistically weighting self-learned rules according to a small set of phonological rules grouped by category, or by re-weighting “seen” phonological rules. With this method, large corpora of mainly unprompted speech can be segmented automatically and cost-effectively.

Similar resources

Rule-based Categorial Analysis of Unprompted Speech - a Cross-language Study

In this study, we investigated the influence of language specifics in a cross-language task on automatic segmentation with a self-learning algorithm for the integration of pronunciation rules. The goal of this paper is to present the linguistic and statistical results of a new method to automatically generate pronunciation rules for automatic segmentation of speech: the German MAUSER system. M...

A statistical model for predicting pronunciation

A general statistical model for the prediction of pronunciation given the orthographic transcript or the canonical pronunciation of a spoken utterance is described. The model is based on a Markov process that can be derived from a set of statistically weighted re-write rules. The automatic learning of such re-write rules based on annotated speech data is illustrated. One possible application of...

Independent Automatic Segmentation of Speech by Pronunciation Modeling

In this paper we present an iterative automatic segmentation system which does not require any domain dependent training data. Input to the system is the canonical pronunciation and the speech signal of an utterance to be segmented, as well as a set of phonological pronunciation rules. The output is a string of phonetic labels (SAM-PA[1]) and the corresponding segment boundaries of the speech s...

Regional Pronunciation Variants for Automatic Segmentation

The goal of this paper is to create an extended rule corpus with approximately 2300 phonetic rules which model segmental variation of regional variants of German. The phonetic rules express, at a broad-phonetic level, phenomena of phonetic reduction in German that occur within words and across word boundaries. In order to get an improvement in automatic segmentation of regional speech variants, ...

Automatic segmentation and clustering of speech using sparse coding

We investigate the application of sparse coding and dictionary learning to the discovery of sub-word units in speech. The ultimate goal is to generate pronunciation dictionaries that could be used for automatic speech recognition (ASR). A dictionary of sparse coding atoms is trained to code a subset of the TIMIT corpus. Some of the trained units exhibit strong correlation with specific referenc...

Publication date: 2003